OPTIMAL RATES OF CONVERGENCE FOR CONVEX SET
ESTIMATION FROM SUPPORT FUNCTIONS 

By Adityanand Guntuboyina

University of Pennsylvania

We present a minimax optimal solution to the problem of esti- 
mating a compact, convex set from finitely many noisy measurements 
of its support function. The solution is based on appropriate regular- 
izations of the least squares estimator. Both fixed and random designs 
are considered. 

1. Introduction. We study the nonparametric problem of estimating a
compact, convex set in Euclidean space from noisy support function
measurements. The support function $h_K$ of a compact, convex subset $K$
of $\mathbb{R}^d$ ($d \ge 2$) is defined for $u$ in the unit sphere,
$S^{d-1} := \{x \in \mathbb{R}^d : x_1^2 + \cdots + x_d^2 = 1\}$, by

$h_K(u) := \sup_{x \in K} (x \cdot u)$, where $x \cdot u := x_1 u_1 + \cdots + x_d u_d$.

The function $h_K$ is called the support function of $K$ because it provides
information on support hyperplanes and halfspaces of $K$. Indeed, every
support halfspace of $K$ is of the form $\{x : x \cdot u \le h_K(u)\}$ for
some $u \in S^{d-1}$ and, since $K$ equals the intersection of all its
support halfspaces, the function $h_K$ uniquely determines $K$. For a proof
of this and other elementary properties of the support function, see
Schneider [26], Section 1.7, or Rockafellar [25], Section 13.

We consider the problem of estimating an unknown compact, convex set $K$
from observations $(u_1, Y_1), \dots, (u_n, Y_n)$ drawn according to the model

(1) $Y_i = h_K(u_i) + \xi_i$ for $i = 1, \dots, n$,

where $u_1, \dots, u_n$ are unit vectors and $\xi_1, \dots, \xi_n$ are
independent normally distributed random variables with mean zero and
variance $\sigma^2$. We work with both fixed-design and random-design
settings for $u_1, \dots, u_n$.
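To make model (1) concrete, the following is a minimal sketch for a planar polytope, assuming evenly spaced design directions on the unit circle. It relies on the fact that the supremum defining the support function of a polytope is attained at a vertex. The function names, the example square and the noise level are illustrative assumptions, not part of the paper.

```python
import math
import random


def support_function(vertices, u):
    """h_K(u) for K = conv(vertices): the maximum of x . u over the vertices."""
    return max(x[0] * u[0] + x[1] * u[1] for x in vertices)


def simulate_measurements(vertices, n, sigma, seed=0):
    """Draw (u_i, Y_i), i = 1..n, from model (1) with evenly spaced u_i."""
    rng = random.Random(seed)
    data = []
    for i in range(n):
        theta = 2.0 * math.pi * i / n
        u = (math.cos(theta), math.sin(theta))
        # Y_i = h_K(u_i) + xi_i with xi_i ~ N(0, sigma^2)
        y = support_function(vertices, u) + rng.gauss(0.0, sigma)
        data.append((u, y))
    return data


# Hypothetical example: the square [-1, 1]^2 observed in 8 directions.
square = [(1.0, 1.0), (1.0, -1.0), (-1.0, -1.0), (-1.0, 1.0)]
sample = simulate_measurements(square, n=8, sigma=0.1)
```

Setting `sigma=0.0` returns the exact support function values, which is a convenient sanity check before adding noise.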
Direct motivation for a theoretical study of this problem comes from
applications. For example, Prince and Willsky [24], who were the first to propose
the regression model (1) for this problem, were motivated by an applica- 
tion to Computed Tomography. Lele, Kulkarni and Willsky [19] showed how 
solutions to this problem can be applied to target reconstruction from re- 
solved laser-radar measurements in the presence of registration errors. Gre- 
gor and Rannou [13] considered applications to Projection Magnetic Res- 
onance Imaging. Another application domain where this problem might 
plausibly arise is robotic tactile sensing as has been suggested by Prince 
and Willsky [24]. Under an observation model that is different from the one
considered here, Goldenshluger and Zeevi [12] studied the estimation of the 
support function of the convex support of an unknown intensity function in 
the context of image analysis. 

Additional reasons for analyzing this estimation problem arise from the 
fact that it has a similar flavor to other well-studied regression problems: 

1. It is essentially a nonparametric function estimation problem where the 
true function is assumed to be the support function of a compact, convex set, 
that is, there is an implicit convexity-based constraint on the true regression 
function. Regression and density estimation problems with explicit such con- 
straints, for example, log-concave density estimation and convex regression, 
have received much attention. Some examples of work in this general area 
include Balabdaoui and Wellner [2], Balabdaoui, Rufibach and Wellner [1], 
Cule, Samworth and Stewart [7], Dümbgen and Rufibach [8], Groeneboom,
Jongbloed and Wellner [14, 15], Mammen [20], Seijo and Sen [27]. 

2. Our model $Y_i = \max_{x \in K} (x \cdot u_i) + \xi_i$ can also be viewed as a variant of
the usual linear regression model where the dependent variable is modeled as 
the maximum of linear combinations of the explanatory variables over a set 
of parameter values and where the interest lies in estimating the convex hull 
of the set of parameters. While we do not know if this maximum regression 
model has been used outside the context of convex set estimation, the idea 
of combining linear functions of independent variables into nonlinear
algorithmic prediction models for the response variable is familiar (as in
neural networks).

Let us now briefly describe the previous work on this problem. The least
squares estimator has been the most commonly used. It is defined as

(2) $\hat{K}_{ls} := \operatorname{argmin}_{L} \sum_{i=1}^{n} (Y_i - h_L(u_i))^2$,

where the minimum is taken over all compact, convex subsets $L$. The
minimizer here is not unique and one can always take it to be a polytope
(convex set with finitely many corners; more carefully defined in the next
section).
This estimator, for $d = 2$, was first proposed by Prince and Willsky [24],
who assumed that $u_1, \dots, u_n$ are evenly spaced on the unit circle and
that the error variables $\xi_1, \dots, \xi_n$ are normal with zero mean. They
also proposed an algorithm for computing it based on quadratic programming.
Lele, Kulkarni and Willsky [19] extended this algorithm to include the case
of nonevenly spaced two-dimensional $u_1, \dots, u_n$ as well. Recently,
Gardner and Kiderlen [10] proposed an algorithm for computing a minimizer of
the least squares criterion for every dimension $d \ge 2$ and every sequence
$u_1, \dots, u_n$.

In addition to the least squares estimator, Prince and Willsky [24] and 
Lele, Kulkarni and Willsky [19] also proposed estimators (in the case d = 2) 
designed to take advantage of certain forms of prior knowledge, when avail- 
able, about the true compact, convex set. These estimators are all based on 
a least squares minimization. 

Fisher, Hall, Turlach and Watson [9] proposed estimators for $d = 2$ that
are not based on the least squares criterion. They made smoothness
assumptions on the true support function (viewed as a function on the unit
circle or on the interval $(-\pi, \pi]$) and estimated it using periodic
versions of standard nonparametric regression techniques such as local
regression, kernel smoothing and splines. They suggested a way to convert
the estimator of $h_K$ into an estimator for $K$ using a formula, which works
for smooth $h_K$, for the boundary of $K$ in terms of $h_K$. Hall and
Turlach [18] added a corner-finding technique to the method of Fisher et
al. [9] to estimate two-dimensional convex sets with certain types of corners.

There are relatively fewer theoretical results in the literature. Fisher et 
al. [9], Theorem 4.1, stated a theorem without proof which appears to imply 
consistency and certain rates of convergence for their estimator under certain 
smoothness assumptions on the true support function. Gardner, Kiderlen 
and Milanfar [11] proved consistency of the least squares estimator and also 
derived rates of convergence. They worked with the following assumptions: 

1. $u_1, u_2, \dots$ are deterministic satisfying

$\max_{u \in S^{d-1}} \min_{1 \le i \le n} d_d(u, u_i) = O(n^{-1/(d-1)})$ as $n \to \infty$,

where $d_d$ denotes the usual Euclidean distance on $\mathbb{R}^d$ [see (8)].

2. $\xi_1, \xi_2, \dots$ are independently distributed according to the normal
distribution with mean zero and variance $\sigma^2$.

3. $K$ is contained in the ball of radius $\Gamma$ centered at the origin with
$\Gamma \ge \sigma^{15/2}$.

For the loss function $\ell_f^2$ defined in (5) below, Gardner et al. [11],
Corollary 5.7, showed that $\ell_f^2(K, \hat{K}_{ls}) = O_{d,\sigma,\Gamma}(\beta_n)$
as $n$ approaches $\infty$ almost surely, where

(3) $\beta_n := n^{-4/(d+3)}$ when $d = 2, 3, 4$;
    $\beta_n := n^{-1/2} (\log n)^2$ when $d = 5$;
    $\beta_n := n^{-2/(d-1)}$ when $d \ge 6$.

Here $O_{d,\sigma,\Gamma}$ is the usual big-O notation where the constant
involved depends on $d$, $\sigma$ and $\Gamma$. Gardner et al. [11], Corollary
5.7, provided explicit expressions for the dependence of the constant on
$\sigma$ and $\Gamma$ (but not $d$), which we have suppressed here because our
interest lies only in the dependence on $n$. Also see Gardner et al. [11],
Lemma 3.2, for implications of Assumption 1 on the unit vector sequence
$\{u_i\}$.

Such a strange dependence of the rates of convergence of the least squares
estimator on the dimension has also been observed in other situations (see,
e.g., Birgé and Massart [3], van de Geer [29], Seregin and Wellner [28]).

Our results in this paper, described below, imply that the rates (3) proved
by Gardner et al. [11] for the least squares estimator are optimal when
$d \le 4$ and suboptimal when $d \ge 5$. We show how estimators can be
constructed that converge at the rate $n^{-4/(d+3)}$ for all dimensions
$d \ge 2$. Our estimators are based on regularizing $\hat{K}_{ls}$ by
minimizing the least squares criterion on certain well-chosen subsets of the
parameter space. In contrast to Gardner et al. [11], we take the more
customary approach in nonparametric statistics of proving rates for the
expected loss, or risk, instead of almost sure bounds for the loss. An
advantage is that this results in bounds for a finite (though large) $n$,
thereby circumventing the need to let $n$ approach infinity.
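The comparison between the exponents in (3) and the exponent $4/(d+3)$ can be tabulated with exact arithmetic. The sketch below is illustrative only; the function names are assumptions. It records just the power of $n$, so the extra $(\log n)^2$ factor in (3) at $d = 5$ is not visible in the exponent itself.

```python
from fractions import Fraction


def lse_exponent(d):
    """Exponent a such that the rate (3) is n^(-a), ignoring log factors."""
    if d < 2:
        raise ValueError("the dimension d must be at least 2")
    if d <= 4:
        return Fraction(4, d + 3)
    if d == 5:
        return Fraction(1, 2)  # (3) carries an additional (log n)^2 factor here
    return Fraction(2, d - 1)


def minimax_exponent(d):
    """Exponent of the rate n^(-4/(d+3)) attained by the regularized estimators."""
    if d < 2:
        raise ValueError("the dimension d must be at least 2")
    return Fraction(4, d + 3)


for d in range(2, 9):
    print(d, lse_exponent(d), minimax_exponent(d))
```

For $d \ge 6$ the least squares exponent $2/(d-1)$ is strictly smaller than $4/(d+3)$, which is the suboptimality referred to above.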

We establish an optimal minimax theory for this problem in both fixed- 
design and random-design frameworks. 

In the fixed-design framework, we assume that $u_1, \dots, u_n$ are
deterministic. We define the minimax risk in this setting as (the subscript
$f$ below stands for fixed design):

(4) $R_f(n) = R_f(n; \sigma, \Gamma) := \inf_{\hat{K}} \sup_{K \in \mathcal{K}^d(\Gamma)} \mathbb{E}_K \ell_f^2(K, \hat{K})$

with

(5) $\ell_f^2(K, K') := \frac{1}{n} \sum_{i=1}^{n} (h_K(u_i) - h_{K'}(u_i))^2$,

where $\mathcal{K}^d(\Gamma)$ denotes the set of all compact, convex sets
contained in the ball of radius $\Gamma$ centered at the origin and
$\mathbb{E}_K$ denotes expectation taken when the true compact, convex set
equals $K$. We assume that $\sigma$ and $\Gamma$ are known. The infimum in the
definition of $R_f(n)$ is over all possible estimators $\hat{K}$, where
estimators are defined to be functions of $(u_1, Y_1), \dots, (u_n, Y_n)$ as
well as of $\sigma$ and $\Gamma$ taking values in the space of all compact,
convex sets.

For every deterministic set of unit vectors $u_1, \dots, u_n$, we show that
$R_f(n)$ is bounded from above by a constant (which is independent of $n$)
multiple of $n^{-4/(d+3)}$. Under a specific assumption on $u_1, \dots, u_n$,
we also prove that a constant multiple of $n^{-4/(d+3)}$ is a lower bound for
$R_f(n)$. The upper bound is proved by considering least squares estimators
on appropriate subsets of the set of all compact, convex sets in
$\mathbb{R}^d$. The lower bound is proved by the application of Assouad's
lemma to a special finite collection of convex sets.
We also study the random-design setting where we assume that
$u_1, \dots, u_n$ are independently distributed according to a fixed
probability measure, $\nu$, on $S^{d-1}$ and that the errors
$\xi_1, \dots, \xi_n$ are independent of $u_1, \dots, u_n$. Here we define the
minimax risk as (the subscript $r$ below stands for random design):

(6) $R_r(n) = R_r(n; \sigma, \Gamma) := \inf_{\hat{K}} \sup_{K \in \mathcal{K}^d(\Gamma)} \mathbb{E}_K \ell_r^2(K, \hat{K})$

with

(7) $\ell_r^2(K, K') := \int_{S^{d-1}} (h_K(u) - h_{K'}(u))^2 \, d\nu(u)$.

For every probability measure $\nu$ on $S^{d-1}$, we show that $R_r(n)$ is
bounded from above by a constant (which is independent of $n$) multiple of
$n^{-4/(d+3)}$. The proof techniques here are similar to the fixed-design
setting. When $\nu$ equals $\nu_{unif}$, the uniform probability measure on
$S^{d-1}$, we prove that a constant multiple of $n^{-4/(d+3)}$ is also a
lower bound for $R_r(n)$. We use a different lower bound proof here from the
one used in the fixed-design setting.

We would like to remark here that the rate $n^{-4/(d+3)}$ is quite natural in
connection to the minimax estimation of smooth functions. Indeed, the unit
sphere has dimension $d - 1$, and the class of smooth functions on a space of
dimension $d - 1$ with smoothness $\gamma$ allows the minimax rate
$n^{-2\gamma/(2\gamma + d - 1)}$. Our problem here has a convexity constraint
and convexity is associated, in a broad sense, with the smoothness
$\gamma = 2$, which explains the rate $n^{-4/(d+3)}$.

After setting up the necessary notation in the next section, we prove the 
fixed-design bounds in Section 3 and the random-design bounds in Section 4. 
Some auxiliary results that are needed for the proofs of the main theorems 
are collected in the three Appendices. 

2. Notation. This section will fix notation and introduce some standard 
notions that are used in the paper. 

$\mathbb{P}_K$ denotes the probability distribution of the observations when
the true compact, convex set equals $K$. In other words, $\mathbb{P}_K$ is the
joint distribution of $(Y_1, \dots, Y_n)$ in the fixed-design setting and the
joint distribution of $(u_1, Y_1), \dots, (u_n, Y_n)$ in the random-design
setting. We use the same notation in both cases as the setting will be clear
from the context. Expectation under $\mathbb{P}_K$ is denoted by
$\mathbb{E}_K$.

For a real-valued function $f$ on $S^{d-1} \times \cdots \times S^{d-1}$, let
$\mathbb{E}_\nu f(u_1, \dots, u_n)$ denote expectation taken under the
assumption that $u_1, \dots, u_n$ are independently distributed according
to $\nu$.

We denote the usual Euclidean distance on $\mathbb{R}^d$ by $d_d$, that is,
for $x = (x_1, \dots, x_d)$ and $y = (y_1, \dots, y_d)$ in $\mathbb{R}^d$,

(8) $d_d(x, y) := \Bigl( \sum_{i=1}^{d} (x_i - y_i)^2 \Bigr)^{1/2}$.

The closed ball in $\mathbb{R}^d$ of radius $\alpha > 0$ centered at
$a \in \mathbb{R}^d$ is denoted by $B_d(a, \alpha)$, that is,
$B_d(a, \alpha) := \{x \in \mathbb{R}^d : d_d(x, a) \le \alpha\}$.

The uniform probability measure on $S^{d-1}$ is denoted by $\nu_{unif}$.
The convex hull of a subset $S$ of $\mathbb{R}^d$, defined as the
intersection of all convex sets containing $S$, is denoted by conv($S$).
Convex hulls of finite subsets of $\mathbb{R}^d$ are called polytopes. The
set of all polytopes in $\mathbb{R}^d$ with at most $m$ extreme points
(corners) is denoted by $\mathcal{P}_m$, that is,

$\mathcal{P}_m := \{\mathrm{conv}(S) : S \subseteq \mathbb{R}^d \text{ with cardinality at most } m\}$.

For $\Gamma > 0$, let $\mathcal{P}_m(\Gamma)$ denote the set of all polytopes
in $\mathcal{P}_m$ that are contained in $B_d(0, \Gamma)$:

(9) $\mathcal{P}_m(\Gamma) := \{K \in \mathcal{P}_m : K \subseteq B_d(0, \Gamma)\}$.

For two compact, convex subsets $K$ and $K'$ of $\mathbb{R}^d$, the Hausdorff
distance between them is defined as

(10) $\ell_H(K, K') := \sup_{u \in S^{d-1}} |h_K(u) - h_{K'}(u)|$.

It is apparent that both $\ell_f^2(K, K')$ and $\ell_r^2(K, K')$ are less than
or equal to $\ell_H^2(K, K')$. The Hausdorff distance has the following
alternative expression:

(11) $\ell_H(K, K') = \max\Bigl( \sup_{x \in K} \inf_{y \in K'} d_d(x, y), \ \sup_{x \in K'} \inf_{y \in K} d_d(x, y) \Bigr)$.

A simple proof of the equivalence of (10) and (11) can be found in
Schneider [26], Theorem 1.8.11.
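As a rough numerical illustration of definition (10), the sketch below approximates the Hausdorff distance between two planar polytopes by maximizing the support function difference over a finite grid of directions. The grid maximum is only a lower approximation of the supremum over the whole circle, and the function names and the example squares are assumptions made for illustration.

```python
import math


def support_function(vertices, u):
    """h_K(u) for K = conv(vertices): the maximum of x . u over the vertices."""
    return max(x[0] * u[0] + x[1] * u[1] for x in vertices)


def hausdorff_via_support(verts_a, verts_b, n_dirs=360):
    """Grid approximation of sup_u |h_A(u) - h_B(u)| over the unit circle."""
    best = 0.0
    for i in range(n_dirs):
        theta = 2.0 * math.pi * i / n_dirs
        u = (math.cos(theta), math.sin(theta))
        gap = abs(support_function(verts_a, u) - support_function(verts_b, u))
        best = max(best, gap)
    return best


# Hypothetical example: the squares [-1, 1]^2 and [-2, 2]^2.
small = [(1.0, 1.0), (1.0, -1.0), (-1.0, -1.0), (-1.0, 1.0)]
large = [(2.0, 2.0), (2.0, -2.0), (-2.0, -2.0), (-2.0, 2.0)]
dist = hausdorff_via_support(small, large)
```

For these two squares the support functions differ by $|u_1| + |u_2|$, which is largest in the diagonal directions. This agrees with (11), since the corner $(2, 2)$ is at distance $\sqrt{2}$ from the smaller square.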

The standard notions of packing and covering numbers will be frequently
used and we have collected their definitions below for the convenience of the
reader. Let $\Theta$ be an arbitrary set and let $\rho$ be a nonnegative
function on $\Theta \times \Theta$ ($\rho$ does not necessarily have to be a
metric).

1. Packing: By an $\eta$-packing subset of $(\Theta, \rho)$, we mean a subset
$S \subseteq \Theta$ for which $\rho(\theta, \theta') > \eta$ for all
$\theta, \theta' \in S$ with $\theta \ne \theta'$. By a maximal
$\eta$-packing subset, we mean an $\eta$-packing subset that is not a proper
subset of any other $\eta$-packing subset. The packing number
$N(\Theta, \eta; \rho)$ is defined as the maximum of the cardinalities of all
$\eta$-packing subsets of $\Theta$.

2. Covering: By an $\epsilon$-covering subset of $(\Theta, \rho)$, we mean a
subset $S \subseteq \Theta$ such that $\min_{s \in S} \rho(t, s) \le \epsilon$
for every $t \in \Theta$. The $\epsilon$-covering number
$M(\Theta, \epsilon; \rho)$ is defined as the minimum of the cardinalities of
all $\epsilon$-covering subsets of $\Theta$.
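The packing definition can be illustrated on a finite set with a greedy construction. The sketch below is an assumption-laden illustration, not a procedure from the paper: it scans a finite list of points and keeps a point whenever it is more than $\eta$ away from everything kept so far. The result is an $\eta$-packing subset that is maximal within the given finite list, so every listed point lies within $\eta$ of a kept point.

```python
import math


def euclid(p, q):
    """Usual Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def greedy_maximal_packing(points, eta, rho=euclid):
    """Greedily build an eta-packing subset that is maximal within `points`."""
    chosen = []
    for p in points:
        # Keep p only if it is more than eta away from every point kept so far.
        if all(rho(p, q) > eta for q in chosen):
            chosen.append(p)
    return chosen


# Hypothetical example: 200 evenly spaced points on the unit circle.
circle = [
    (math.cos(2.0 * math.pi * k / 200), math.sin(2.0 * math.pi * k / 200))
    for k in range(200)
]
packing = greedy_maximal_packing(circle, eta=0.5)
```

Maximality here is relative to the finite list only. On the full sphere a maximal packing subset need not be produced by scanning a grid.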

The cardinality of a finite set $F$ is denoted by $|F|$.

We use the following notions of distance between probability measures $P$
and $Q$ having densities $p$ and $q$ with respect to a common measure $\mu$:

1. Total variation distance: $\|P - Q\|_{TV} := \int (|p - q|/2) \, d\mu$.

2. Kullback-Leibler divergence: $D(P\|Q) := \int p \log(p/q) \, d\mu$ if $P$ is
absolutely continuous with respect to $Q$ and $\infty$ otherwise.

3. Chi-squared divergence: $\chi^2(P\|Q) := \int (p^2/q) \, d\mu - 1$ if $P$ is
absolutely continuous with respect to $Q$ and $\infty$ otherwise. We also
write $\chi(P\|Q) := (\chi^2(P\|Q))^{1/2}$.

Pinsker's inequality states

(12) $D(P\|Q) \ge 2 \|P - Q\|_{TV}^2$ for all probability measures $P$ and $Q$.
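Pinsker's inequality (12), in the form $D(P\|Q) \ge 2\|P - Q\|_{TV}^2$ that matches the normalization of the total variation distance above, can be checked numerically in the simplest case of two Bernoulli distributions. The sketch below is only a sanity check under that assumption, with illustrative function names. For Bernoulli($p$) and Bernoulli($q$), the total variation distance defined above reduces to $|p - q|$.

```python
import math


def tv_bernoulli(p, q):
    """Total variation distance (with the factor 1/2) between Bernoulli laws."""
    return 0.5 * (abs(p - q) + abs((1.0 - p) - (1.0 - q)))


def kl_bernoulli(p, q):
    """Kullback-Leibler divergence D(Bernoulli(p) || Bernoulli(q)), 0 < p, q < 1."""
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


def pinsker_gap(p, q):
    """D(P||Q) - 2 * TV(P, Q)^2, which Pinsker's inequality says is nonnegative."""
    return kl_bernoulli(p, q) - 2.0 * tv_bernoulli(p, q) ** 2


grid = [k / 10.0 for k in range(1, 10)]
smallest_gap = min(pinsker_gap(p, q) for p in grid for q in grid)
```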

We use the symbols c, C, etc. to denote positive constants depending only 
on the dimension d. Their value may change with every occurrence. 

3. Fixed design setting. In this section, we assume that $u_1, \dots, u_n$
are deterministic. The errors $\xi_1, \dots, \xi_n$ are assumed to be
independently distributed according to the normal distribution with mean
zero and variance $\sigma^2$. We consider the loss function $\ell_f^2$
[defined in (5)] and prove upper and lower bounds for the corresponding
minimax risk $R_f(n)$ over $\mathcal{K}^d(\Gamma)$ [see the definition (4)].

3.1. Upper bound for $R_f(n)$. The following result shows that $R_f(n)$ is at
most $n^{-4/(d+3)}$ up to a multiplicative constant that is independent
of $n$. We make no assumptions on the deterministic design unit vectors
$u_1, \dots, u_n$; they are completely arbitrary. It should be noted that the
loss function $\ell_f^2$ is naturally associated with this fixed-design
setup, which enables the following theorem to hold with no assumptions
whatsoever on $u_1, \dots, u_n$. On the other hand, such assumptions would be
unavoidable if one were interested in proving risk bounds for other loss
functions under the fixed-design setting. A similar remark also applies to
Theorem 4.1, where the natural loss function is $\ell_r^2$.

Theorem 3.1. There exist positive constants $c$ and $C$ depending only
on the dimension $d$ such that

(13) $R_f(n) \le c \, \sigma^{8/(d+3)} \Gamma^{2(d-1)/(d+3)} n^{-4/(d+3)}$ if $n \ge C (\sigma/\Gamma)^2$.

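Proof. For each finite subset $F$ of $\mathcal{K}^d(\Gamma)$, let us define the
least squares estimator $\hat{K}_F$ by

(14) $\hat{K}_F := \operatorname{argmin}_{L \in F} \sum_{i=1}^{n} (Y_i - h_L(u_i))^2$.

We show that, if $F$ is chosen appropriately, then
$\mathbb{E}_K \ell_f^2(K, \hat{K}_F)$ is bounded from above by the right-hand
side of (13) for every $K \in \mathcal{K}^d(\Gamma)$.

Fix $K \in \mathcal{K}^d(\Gamma)$. We start with the following trivial
inequality, which holds for every nonnegative function $G$ on $F$ and every
$\alpha > 0$:

$G(\hat{K}_F) \le \sum_{L \in F} G(L) \exp\Bigl( \alpha \sum_i (Y_i - h_{\hat{K}_F}(u_i))^2 - \alpha \sum_i (Y_i - h_L(u_i))^2 \Bigr)$,

the reason being that the term for $L = \hat{K}_F$ in the sum on the
right-hand side equals $G(\hat{K}_F)$.

Because $\hat{K}_F$ is the least squares estimator over $F$, we can replace it
in the right-hand side above by an arbitrary $L' \in F$. Taking expectations
on both sides of the resulting inequality, we obtain

(15) $\mathbb{E}_K G(\hat{K}_F) \le \sum_{L \in F} G(L) \, \mathbb{E}_K \exp\Bigl( \alpha \sum_i (Y_i - h_{L'}(u_i))^2 - \alpha \sum_i (Y_i - h_L(u_i))^2 \Bigr)$

for every $L' \in F$. The expectation term on the right-hand side equals

$\exp\bigl( -\alpha n \ell_f^2(K, L) + \alpha n \ell_f^2(K, L') + 2 \alpha^2 \sigma^2 n \ell_f^2(L, L') \bigr)$.

We then use the elementary inequality
$\ell_f^2(L, L') \le 2 \ell_f^2(K, L) + 2 \ell_f^2(K, L')$ and the fact that
$\ell_f^2(K, L') \le \ell_H^2(K, L')$ to obtain the following upper bound for
$\mathbb{E}_K G(\hat{K}_F)$:

$\min_{L' \in F} \sum_{L \in F} G(L) \exp\bigl( (-\alpha + 4 \alpha^2 \sigma^2) n \ell_f^2(K, L) + (\alpha + 4 \alpha^2 \sigma^2) n \ell_H^2(K, L') \bigr)$.

The choices

$G(L) = \exp\Bigl( \frac{n \ell_f^2(K, L)}{16 \sigma^2} \Bigr)$ and $\alpha = \frac{1}{8 \sigma^2}$

lead to the following risk bound:

(16) $\mathbb{E}_K \exp\Bigl( \frac{n \ell_f^2(K, \hat{K}_F)}{16 \sigma^2} \Bigr) \le |F| \exp\Bigl( \frac{3n}{16 \sigma^2} \min_{L' \in F} \ell_H^2(K, L') \Bigr)$,

where $|F|$ denotes the cardinality of $F$. Using Jensen's inequality on the
left-hand side and taking logarithms on both sides, we deduce

$\mathbb{E}_K \ell_f^2(K, \hat{K}_F) \le \frac{16 \sigma^2}{n} \log |F| + 3 \min_{L' \in F} \ell_H^2(K, L')$.

Since $K \in \mathcal{K}^d(\Gamma)$ was arbitrary, we get

(17) $R_f(n) \le \frac{16 \sigma^2}{n} \log |F| + 3 \sup_{K \in \mathcal{K}^d(\Gamma)} \min_{L' \in F} \ell_H^2(K, L')$.

We now use a classical result on the covering numbers of
$(\mathcal{K}^d(\Gamma), \ell_H)$ due to Bronshtein [4], Theorem 3 and Remark 1,
which states that there exist positive constants $c$ and $\epsilon_0$, which
depend on $d$ alone, such that for every $\epsilon \le \Gamma \epsilon_0$, there
exists a finite subset $F \subseteq \mathcal{K}^d(\Gamma)$ satisfying

(18) $\log |F| \le c \Bigl( \frac{\Gamma}{\epsilon} \Bigr)^{(d-1)/2}$ and $\sup_{K \in \mathcal{K}^d(\Gamma)} \min_{L' \in F} \ell_H^2(K, L') \le \epsilon^2$.

Combining (17) and (18), we get

$R_f(n) \le \frac{16 c \sigma^2}{n} \Bigl( \frac{\Gamma}{\epsilon} \Bigr)^{(d-1)/2} + 3 \epsilon^2$ for every $0 < \epsilon \le \Gamma \epsilon_0$.

If we now choose
$\epsilon := \sigma^{4/(d+3)} \Gamma^{(d-1)/(d+3)} n^{-2/(d+3)}$, then
$\epsilon \le \Gamma \epsilon_0$ provided $n \ge C (\sigma/\Gamma)^2$ for a large
enough constant $C$ depending only on $d$, and the required inequality (13)
follows. □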
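The last step of the proof balances the two terms of the bound $\frac{16 c \sigma^2}{n} (\Gamma/\epsilon)^{(d-1)/2} + 3 \epsilon^2$. The sketch below checks this arithmetic numerically: at the stated choice of $\epsilon$, the first term equals $16 c \, \epsilon^2$, so the whole bound is $(16c + 3) \epsilon^2$ with $\epsilon^2 = \sigma^{8/(d+3)} \Gamma^{2(d-1)/(d+3)} n^{-4/(d+3)}$. The constant `c` stands in for the unspecified constant in (18); its value here, like the function names, is a placeholder assumption.

```python
import math


def risk_bound(eps, n, sigma, gamma, d, c=1.0):
    """Right-hand side obtained by combining (17) and (18); c is a placeholder."""
    entropy_term = 16.0 * c * sigma ** 2 / n * (gamma / eps) ** ((d - 1) / 2.0)
    approximation_term = 3.0 * eps ** 2
    return entropy_term + approximation_term


def balancing_eps(n, sigma, gamma, d):
    """The choice eps = sigma^(4/(d+3)) * Gamma^((d-1)/(d+3)) * n^(-2/(d+3))."""
    return (
        sigma ** (4.0 / (d + 3))
        * gamma ** ((d - 1.0) / (d + 3))
        * n ** (-2.0 / (d + 3))
    )


# Hypothetical parameter values, chosen only for illustration.
n, sigma, gamma, c = 10000, 0.5, 2.0, 1.0
for d in range(2, 7):
    eps = balancing_eps(n, sigma, gamma, d)
    ratio = risk_bound(eps, n, sigma, gamma, d, c) / eps ** 2
    print(d, eps, ratio)  # ratio should be 16 * c + 3 for every d
```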

3.2. Lower bound for $R_f(n)$. We show that $n^{-4/(d+3)}$ is also a lower
bound for $R_f(n)$ up to a multiplicative constant that is independent of $n$.
We make the assumption that the fixed unit vectors $u_1, \dots, u_n$ form a
maximal $\epsilon$-packing subset (under the Euclidean metric $d_d$) of
$S^{d-1}$ for some $\epsilon \in (0, 1]$. The definition of a maximal packing
set was given in Section 2. Note that it is impossible to prove the lower
bound $n^{-4/(d+3)}$ for $R_f(n)$ without any assumptions on
$u_1, \dots, u_n$. For example, if $u_1 = \cdots = u_n$, then $R_f(n)$ is of
the order $1/n$.

A standard argument [sketched in the Appendix; see inequality (41)]
shows that our assumption on $u_1, \dots, u_n$ implies that

(19) $c \, \epsilon^{1-d} \le n \le C \epsilon^{1-d}$

for two constants $c$ and $C$ depending only on $d$. The following is the main
theorem of this section.

Theorem 3.2. Suppose $u_1, \dots, u_n$ form a maximal $\epsilon$-packing
subset of $S^{d-1}$ for some $\epsilon \in (0, 1]$. There exist positive
constants $c$ and $C$ depending only on $d$ such that

(20) $R_f(n) \ge c \, \sigma^{8/(d+3)} \Gamma^{2(d-1)/(d+3)} n^{-4/(d+3)}$

whenever $n \ge C \max\bigl( (\sigma/\Gamma)^2, (\Gamma/\sigma)^{(d-1)/2} \bigr)$.

Our proof is based on the application of Assouad's lemma to an explicitly
constructed finite subset of $\mathcal{K}^d(\Gamma)$. The following version of
Assouad's lemma is taken from van der Vaart [30], page 347. Recall that
$\mathbb{P}_K$ denotes the probability distribution of the observations when
the true compact, convex set is $K$.

Lemma 3.3 (Assouad). Let $m$ be a positive integer and suppose that,
for each $\tau \in \{0, 1\}^m$, there is an associated set $K(\tau)$ in
$\mathcal{K}^d(\Gamma)$. Then the following inequality holds:

$R_f(n) \ge \frac{m}{8} \min_{\tau \ne \tau'} \frac{\ell_f^2(K(\tau), K(\tau'))}{\Upsilon(\tau, \tau')} \min_{\Upsilon(\tau, \tau') = 1} \bigl( 1 - \|\mathbb{P}_{K(\tau)} - \mathbb{P}_{K(\tau')}\|_{TV} \bigr)$,

where $\Upsilon(\tau, \tau') := \sum_{i=1}^{m} 1\{\tau_i \ne \tau_i'\}$.
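Proof of Theorem 3.2. We apply Assouad's lemma to the following
construction. For a fixed positive $\eta \le 1/8$, we consider unit vectors
$v_1, \dots, v_m$ that form a maximal $2\sqrt{2\eta}$-packing subset of the
unit sphere (under $d_d$) and define

$K(\tau) := D_1(\tau_1) \cap \cdots \cap D_m(\tau_m)$ for $\tau \in \{0, 1\}^m$,

where

$D_j(0) := B_d(0, \Gamma) \cap \{x : x \cdot v_j \le \Gamma (1 - \eta)\}$ and $D_j(1) := B_d(0, \Gamma)$.

One consequence of the assumption on $v_1, \dots, v_m$ is that
$m \ge c \, \eta^{-(d-1)/2}$ for a constant $c$. Another consequence is that the
sets $B_d(0, \Gamma) \cap \{x : x \cdot v_j > \Gamma (1 - \eta)\}$ are disjoint,
which implies that

$\ell_f^2(K(\tau), K(\tau')) = \sum_{j : \tau_j \ne \tau_j'} \ell_f^2(D_j(0), D_j(1)) = \Upsilon(\tau, \tau') \, \ell_f^2(D_1(0), D_1(1))$

for every $\tau, \tau' \in \{0, 1\}^m$. In Lemma A.1 (stated and proved in
Appendix A), we show that there exist constants $c$ and $C$ such that

$c \, \Gamma^2 \eta^{(d+3)/2} \le \ell_f^2(D_1(0), D_1(1)) \le C \Gamma^2 \eta^{(d+3)/2}$

provided $0 < \eta \le 1/8$ and $\eta \ge C \epsilon^2$. Therefore,

(21) $\min_{\tau \ne \tau'} \frac{\ell_f^2(K(\tau), K(\tau'))}{\Upsilon(\tau, \tau')} \ge c \, \Gamma^2 \eta^{(d+3)/2}$ if $0 < \eta \le 1/8$ and $\eta \ge C \epsilon^2$.

To bound $\|\mathbb{P}_{K(\tau)} - \mathbb{P}_{K(\tau')}\|_{TV}$, we use Pinsker's
inequality (12) because the Kullback-Leibler divergence
$D(\mathbb{P}_{K(\tau)} \| \mathbb{P}_{K(\tau')})$ has a simple expression in
terms of $\ell_f^2(K(\tau), K(\tau'))$:

$\|\mathbb{P}_{K(\tau)} - \mathbb{P}_{K(\tau')}\|_{TV}^2 \le \frac{1}{2} D(\mathbb{P}_{K(\tau)} \| \mathbb{P}_{K(\tau')}) = \frac{n}{4 \sigma^2} \ell_f^2(K(\tau), K(\tau')) = \frac{n}{4 \sigma^2} \Upsilon(\tau, \tau') \, \ell_f^2(D_1(0), D_1(1))$.

By a second application of Lemma A.1, we obtain

(22) $\min_{\Upsilon(\tau, \tau') = 1} \bigl( 1 - \|\mathbb{P}_{K(\tau)} - \mathbb{P}_{K(\tau')}\|_{TV} \bigr) \ge 1 - C \frac{\Gamma \sqrt{n}}{\sigma} \eta^{(d+3)/4}$

if $0 < \eta \le 1/8$ and $\eta \ge C \epsilon^2$. Therefore, applying Assouad's
lemma with the inequalities (21), (22) and $m \ge c \, \eta^{-(d-1)/2}$, we
obtain

(23) $R_f(n) \ge c \, \Gamma^2 \eta^2 \Bigl( 1 - C \frac{\Gamma \sqrt{n}}{\sigma} \eta^{(d+3)/4} \Bigr)$

if $0 < \eta \le 1/8$ and $\eta \ge C \epsilon^2$. We now make the choice

$\eta := c \Bigl( \frac{\sigma}{\Gamma \sqrt{n}} \Bigr)^{4/(d+3)}$

for a sufficiently small positive constant $c$ depending only on $d$. Then
$\eta \le 1/8$ provided $n \ge C (\sigma/\Gamma)^2$. Also, from (19),
$\epsilon \le C n^{-1/(d-1)}$ and thus, for $\eta \ge C \epsilon^2$, it is enough
to ensure that $\eta \ge C n^{-2/(d-1)}$ which, upon simplification, reduces to
$n \ge C (\Gamma/\sigma)^{(d-1)/2}$. Inequality (23) with this choice of $\eta$
then implies (20), which completes the proof. □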

3.3. A more natural estimator. In Theorem 3.1, we used the least squares
estimator on an appropriate finite subset of $\mathcal{K}^d(\Gamma)$ to prove
the optimal upper bound for $R_f(n)$. This estimator can be viewed as a
regularized version of the full least squares estimator
$\hat{K}_{ls} := \operatorname{argmin}_{L} \sum_{i=1}^{n} (Y_i - h_L(u_i))^2$,
for which the rates [see (3)] proved by Gardner et al. [11] are suboptimal
for $d \ge 5$.

We remarked [just after (2)] that, for the full least squares estimator,
the set $L$ which minimizes $\sum_{i=1}^{n} (Y_i - h_L(u_i))^2$ is not unique.
Gardner and Kiderlen [10] observed that a minimizer can always be chosen to
be a polytope with at most $n$ extreme points and provided an algorithm for
computing such a minimizer. In light of this observation of Gardner and
Kiderlen [10], we consider the following estimator, which is a more intuitive
regularization of $\hat{K}_{ls}$ compared to $\hat{K}_F$:

(24) $\hat{K}_m := \operatorname{argmin}_{L \in \mathcal{P}_m(\Gamma)} \sum_{i=1}^{n} (Y_i - h_L(u_i))^2$.

The set $\mathcal{P}_m(\Gamma)$ was defined in (9). The best risk achievable by
$\hat{K}_m$ is defined as

$\tilde{R}_f(n) := \inf_{m \ge 1} \sup_{K \in \mathcal{K}^d(\Gamma)} \mathbb{E}_K \ell_f^2(K, \hat{K}_m)$.

It is not too hard to see that $\hat{K}_m$ equals the least squares estimator
$\hat{K}_{ls}$ whenever $m \ge n$. On the other hand, for $m < n$, they can be
quite different.

In this section, we prove the following theorem, which shows that
$\tilde{R}_f(n)$ is bounded from above by $n^{-4/(d+3)}$ up to a multiplicative
factor that is logarithmic in $n$. No assumptions on $u_1, \dots, u_n$ are
necessary.

Theorem 3.4. There exist positive constants $c$ and $C$ that depend only
on the dimension $d$ such that

(25) $\tilde{R}_f(n) \le c \, \sigma^{8/(d+3)} \Gamma^{2(d-1)/(d+3)} n^{-4/(d+3)} \log(c n \Gamma^2/\sigma^2) \log(c n)$

if $n \ge C (\sigma/\Gamma)^2$.
For the proof of this theorem, we use the following result, which is a special
instance of a result on convergence rates of sieved least squares estimators
from van de Geer [29], pages 184 and 185. For a polytope $P \in \mathcal{P}_m$
and $\omega > 0$, let

$S_m(P, \omega) := \{L \in \mathcal{P}_m : \ell_f^2(P, L) \le \omega^2\}$

and let $M(S_m(P, \omega), \epsilon; \ell_f)$ denote the $\epsilon$-covering
number of $S_m(P, \omega)$ under $\ell_f$.

Theorem 3.5. Fix a polytope $P \in \mathcal{P}_m(\Gamma)$. Suppose $\Psi$ is a
function on $(0, \infty)$ such that

$\Psi(\omega) \ge \int_0^{\omega} \sqrt{\log M(S_m(P, \omega), \epsilon; \ell_f)} \, d\epsilon$ for every $\omega > 0$

and such that $\Psi(\omega)/\omega^2$ is decreasing on $(0, \infty)$. Then there
exists a universal constant $C$ such that

(26) $\mathbb{P}_K \bigl( \ell_f^2(\hat{K}_m, P) > \delta \bigr) \le C \sum_{s \ge 0} \exp\Bigl( -\frac{n 2^{2s} \delta}{C \sigma^2} \Bigr)$

for every $\delta$ satisfying $\delta \ge 8 \ell_f^2(K, P)$ and
$\sqrt{n} \, \delta \ge C \sigma \Psi(\sqrt{\delta})$.

The application of this theorem for the proof of (25) requires an upper
bound on $M(S_m(P, \omega), \epsilon; \ell_f)$. Such a bound is provided in
Lemma B.1 (stated and proved in Appendix B).

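Proof of Theorem 3.4. Fix $m \ge 1$ and an arbitrary polytope
$P \in \mathcal{P}_m(\Gamma)$. In Lemma B.1, we show that

$M(S_m(P, \omega), \epsilon; \ell_f) \le \Bigl( 4 + \frac{2 \omega \sqrt{n}}{\epsilon} \Bigr)^{b_1 m d \log(b_2 m)}$

for universal positive constants $b_1$ and $b_2$. This implies that

$\int_0^{\omega} \sqrt{\log M(S_m(P, \omega), \epsilon; \ell_f)} \, d\epsilon \le \sqrt{b_1 m d \log(b_2 m)} \int_0^{\omega} \sqrt{\log\Bigl( 4 + \frac{2 \omega \sqrt{n}}{\epsilon} \Bigr)} \, d\epsilon$

$= \omega \sqrt{b_1 m d \log(b_2 m)} \int_1^{\infty} \frac{\sqrt{\log(4 + 2 \sqrt{n} \, x)}}{x^2} \, dx$

$\le C \omega \sqrt{m d \log(b_2 m) \log(4 + 2 \sqrt{n})}$.

As a result, the function $\Psi(\omega)$ appearing in Theorem 3.5 can be taken
to be

$\Psi(\omega) := C \omega \sqrt{m d \log(b_2 m) \log(4 + 2 \sqrt{n})}$ for every $\omega > 0$

and then (26) gives

(27) $\mathbb{P}_K \bigl( \ell_f^2(\hat{K}_m, P) > \delta \bigr) \le C \sum_{s \ge 0} \exp\Bigl( -\frac{n 2^{2s} \delta}{C \sigma^2} \Bigr)$ whenever $\delta \ge \delta_0$,

where

$\delta_0 := C \Bigl( \ell_f^2(K, P) + \frac{\sigma^2}{n} m d \log(b_2 m) \log(4 + 2 \sqrt{n}) \Bigr)$.

Because $n \delta_0 \ge C \sigma^2$ for a constant $C$, the sum on the right-hand
side of (27) can be bounded from above by a constant multiple of the first
term, and we deduce

$\mathbb{P}_K \bigl( \ell_f^2(\hat{K}_m, P) > \delta \bigr) \le C \exp\Bigl( -\frac{n \delta}{C \sigma^2} \Bigr)$ whenever $\delta \ge \delta_0$.

Integrating both sides of the above inequality with respect to
$\delta \in [\delta_0, \infty)$, we get

$\mathbb{E}_K \bigl( \ell_f^2(\hat{K}_m, P) - \delta_0 \bigr)_+ \le \frac{C \sigma^2}{n} \exp\Bigl( -\frac{n \delta_0}{C \sigma^2} \Bigr) \le \frac{C \sigma^2}{n}$,

where $x_+ := \max(x, 0)$. Because $\sigma^2/n \le C \delta_0$, we get the
expectation bound $\mathbb{E}_K \ell_f^2(\hat{K}_m, P) \le C \delta_0$. The
elementary inequality
$\ell_f^2(\hat{K}_m, K) \le 2 \ell_f^2(\hat{K}_m, P) + 2 \ell_f^2(K, P)$ and the
fact that $\delta_0 \ge \ell_f^2(K, P)$ yield the following risk bound:

(28) $\mathbb{E}_K \ell_f^2(\hat{K}_m, K) \le C \Bigl( \ell_f^2(K, P) + \frac{\sigma^2}{n} m d \log(b_2 m) \log(4 + 2 \sqrt{n}) \Bigr)$.

Since $K \in \mathcal{K}^d(\Gamma)$ and $P \in \mathcal{P}_m(\Gamma)$ were
arbitrary in the above analysis, we have proved the following bound for
$\tilde{R}_f(n)$:

$\tilde{R}_f(n) \le C \inf_{m \ge 1} \Bigl[ \sup_{K \in \mathcal{K}^d(\Gamma)} \inf_{P \in \mathcal{P}_m(\Gamma)} \ell_H^2(K, P) + \frac{\sigma^2}{n} m d \log(b_2 m) \log(4 + 2 \sqrt{n}) \Bigr]$,

where we have also used $\ell_f^2(K, P) \le \ell_H^2(K, P)$.

Bronshtein and Ivanov [6] (see also Bronshtein [5]) proved that there exist
positive constants $C_1$ and $C_2$ depending only on the dimension $d$ such that

(29) $\sup_{K \in \mathcal{K}^d(\Gamma)} \inf_{P \in \mathcal{P}_m(\Gamma)} \ell_H^2(K, P) \le C_1 \Gamma^2 m^{-4/(d-1)}$ whenever $m \ge C_2$.

From this result, we have

$\tilde{R}_f(n) \le C \inf_{m \ge C_2} \Bigl[ \Gamma^2 m^{-4/(d-1)} + \frac{\sigma^2}{n} m d \log(b_2 m) \log(4 + 2 \sqrt{n}) \Bigr]$.

If we now choose
$m := \sigma^{-2(d-1)/(d+3)} \Gamma^{2(d-1)/(d+3)} n^{(d-1)/(d+3)}$, then
$m \ge C_2$ provided $n \ge C (\sigma/\Gamma)^2$ for a large enough constant $C$
depending only on $d$, and the required inequality (25) follows. □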
Remark 3.1. From the above proof, it can be seen that Theorem 3.4
also holds for the following estimator:

$\tilde{K}_m := \operatorname{argmin}_{L \in \mathcal{P}_m} \sum_{i=1}^{n} (Y_i - h_L(u_i))^2$.

The only difference between $\hat{K}_m$ and $\tilde{K}_m$ is that the argmin in
the definition of $\tilde{K}_m$ is taken over all polytopes in $\mathcal{P}_m$,
while that in the definition of $\hat{K}_m$ is only over those polytopes in
$\mathcal{P}_m$ that are contained in the ball of radius $\Gamma$ centered at
the origin. Theorem 3.4 also holds for $\tilde{K}_m$ because we have not used
this boundedness property of sets in $\mathcal{P}_m(\Gamma)$ in our covering
number calculations in Lemma B.1.

Remark 3.2. The above proof also reveals that the value of $m$ for
which $\hat{K}_m$ achieves the optimal rate up to logarithmic terms is of the
order $n^{(d-1)/(d+3)}$. Since this is much smaller than $n$, the estimator
$\hat{K}_m$ for this $m$ is quite different from the full least squares
estimator $\hat{K}_{ls}$.

4. Random design setting. In this section, we assume that $u_1, \dots, u_n$
are independently distributed according to a fixed probability measure $\nu$
on the unit sphere, $S^{d-1}$. The measurement errors $\xi_1, \dots, \xi_n$ are
independent normal random variables with mean zero and variance $\sigma^2$.
We also assume that $\xi_1, \dots, \xi_n$ are independent of
$u_1, \dots, u_n$. We consider the loss function $\ell_r^2$ [defined in (7)]
and prove upper and lower bounds for the corresponding minimax risk $R_r(n)$
[defined in (6)].

4.1. Upper bound for $R_r(n)$. The following result is the random-design
analogue of Theorem 3.1. We show that $R_r(n)$ is bounded from above by
$n^{-4/(d+3)}$ up to multiplication by a constant that is independent of $n$.
No assumptions on $\nu$ are required; it is completely arbitrary.

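Theorem 4.1. There exist positive constants $c$ and $C$ depending only
on the dimension $d$ such that

(30) $R_r(n) \le c \, \frac{\Gamma^2/\sigma^2}{1 - e^{-\Gamma^2/(4\sigma^2)}} \, \sigma^{8/(d+3)} \Gamma^{2(d-1)/(d+3)} n^{-4/(d+3)}$

if $n \ge C (\sigma/\Gamma)^2$.

Remark 4.1. Our proof below also shows that if one works with the
smaller loss function

(31) $\ell_{new}^2(K, K') := -16 \sigma^2 \log \int_{S^{d-1}} \exp\Bigl( -\frac{(h_K(u) - h_{K'}(u))^2}{16 \sigma^2} \Bigr) d\nu(u)$

instead of $\ell_r^2$, then the factor
$(\Gamma^2/\sigma^2)(1 - e^{-\Gamma^2/(4\sigma^2)})^{-1}$ in the minimax risk
bound (30) can be removed.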




Proof of Theorem 4.1. As in the proof of Theorem 3.1, we consi- 
der the least squares estimator Kp [defined in (14)] over a finite subset F 
of IC^iV) for which inequahty (15), reproduced below, holds for every 
L' gF and a > 0: 

^kG{Kf) < G{L)EKexp(a^{Yi - hL'{ui)f - a^(y, - hL{u,)f) ■ 

L£F ^ i i ^ 

Under the random-design setting, the expectation in the right-hand side 
above equals 

e^^{-anl){K, L) + anl){K, L') + 2a^a'^nl){L, L')), 

where, as explained in Section 2, the expectation is taken under the 
assumption that ni,...,ii„ are independently distributed according to v 
(note that i'j depends on ui,...,n„). Using the inequalities £'j{L,L') < 
2ff{K,L) + 2fj{K,L') and £}{K,L') < i]j{K,L'), we obtain the following 
upper bound for '&kG{Kf): 

exp((Q + Wa'^)n£]j{K, L'))G{L)E^ exp((-Q + Aa^a^)n£}{K, L)) 

LeF 

for every L' € F. It may be helpful to note that i'jj{K,L') does not depend 
on ui,...,Un and is nonrandom. We apply this inequality to the following 
choices of G and a: 

G{L) = (^E,exp(^ ^^-^ J J and a = 

A straightforward calculation reveals that this function G{L) has the fol- 
lowing alternative expression: 

G(L)=exp(^£Lw(^,^)), 

where i'^^^ is defined as in (31). Specializing the upper bound for KkG{Kf) 
to these choices of G and a, we deduce 

This is the same inequality as (16) with the loss function £j replaced by 
Thus, following the same steps as in the proof of Theorem 3.1, we deduce 
the existence of positive constants c and C depending only on d such that 

(32) i?ncw(n) < ccT8/('^+3)r2('^-i)/('^+3)n-4/("'+3) if n > C(a/r)2, 
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where 

-Rncw(n) := inf sup EKilc^iK,k). 

This proves the claim made in Remark 4.1. We now give a simple inequality 
relating the loss functions £'^i^^{K, K') and £'^{K,K') which enables us to 
convert this bound for R^icwin) into the required inequality (30) for Rr{n). 
For every K e JC^ir) and u G we have \hK{u)\ < F. Therefore, 

Since the convex function x i— )• lies below the chord joining the points 
(0,1) and (rV(4a2),exp(-rV(4fj2))), we have 

e-^ < 1 + ^(e-r'/(^"') - l)x for < x < ry{Aa^). 

Using this with x = {hK{u) — /ix'(tt))^/(16a"^) and integrating both sides of 
the resulting expression with respect to z^, we get 

Taking logarithms on both sides, we obtain 

, f ( {hK{u)-hK'{u)f \ ^ , ^ e-rV(4<x^)_l 

y "^p(- le/ J '^^("^ ^ 4f2 

where, on the right-hand side, we have used log(l + y) <y. The above in- 
equality can be rewritten as 

f(K K') < _JlIi^^D_f (K K') 

The proof is complete because the required bound (30) follows by combining 
the above inequality with (32). □ 
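The chord step used in this proof can be checked numerically. The following is a minimal sketch under illustrative values of $\Gamma$ and $\sigma$; the helper names `chord_value` and `chord_dominates` are hypothetical and do not come from the paper.

```python
import math


def chord_value(x: float, b: float) -> float:
    """Value at x of the chord of exp(-x) through (0, 1) and (b, exp(-b))."""
    return 1.0 + (math.exp(-b) - 1.0) * x / b


def chord_dominates(gamma: float, sigma: float, steps: int = 1000) -> bool:
    """Check exp(-x) <= chord on a grid of [0, b], with b = gamma^2 / (4 sigma^2)."""
    b = gamma ** 2 / (4.0 * sigma ** 2)
    for i in range(steps + 1):
        x = b * i / steps
        # A small tolerance absorbs floating-point rounding at the endpoints.
        if math.exp(-x) > chord_value(x, b) + 1e-12:
            return False
    return True
```

A grid check of this kind is not a proof; it only illustrates that convexity of $e^{-x}$ places the function below its chord on the whole interval.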

4.2. Lower bound for $R_r(n)$. The following theorem is the random-design
analogue of Theorem 3.2. We assume that $\nu = \nu_{\mathrm{unif}}$ is the uniform probability
measure on $S^{d-1}$ and prove that $R_r(n)$ is bounded from below by a constant
multiple of $n^{-4/(d+3)}$. Note that the lower bound of $n^{-4/(d+3)}$ cannot be
proved for $R_r(n)$ for arbitrary $\nu$. For example, when $\nu$ is concentrated at
a single point, $R_r(n)$ is of order $1/n$.

Theorem 4.2. Consider the random-design setting where $\nu$ equals the
uniform probability measure $\nu_{\mathrm{unif}}$ on $S^{d-1}$. Then there exist positive
constants $c$ and $C$ depending only on $d$ such that

(33)  $R_r(n) \ge c\,\sigma^{8/(d+3)}\Gamma^{2(d-1)/(d+3)}n^{-4/(d+3)}$  whenever $n \ge C(\sigma/\Gamma)^2$.
It is possible to prove this theorem by an appropriate modification of
the proof of Theorem 3.2. We, however, give a different proof using a global
metric entropy minimax lower bound from Guntuboyina [16]. This proof has
an interesting implication that is described in Remark 4.2. A version of this
proof appeared in Guntuboyina [16], Section V, although the result there
has a different assumption on $u_1,\dots,u_n$ and is also slightly less precise.



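Proof of Theorem 4.2. Let $\mathfrak{P} := \{\mathbb{P}_K : K \in \mathcal{K}^d(\Gamma)\}$. We use the
following minimax lower bound from Guntuboyina [16], inequality (22):

(34)  $R_r(n) \ge \frac{\eta^2}{4}\Bigl(1 - \frac{1}{N(\mathcal{K}^d(\Gamma),\eta;\ell_r)} - \sqrt{\frac{(1+\epsilon^2)\,M(\mathfrak{P},\epsilon;\chi)}{N(\mathcal{K}^d(\Gamma),\eta;\ell_r)}}\Bigr)$

for all $\eta > 0$ and $\epsilon > 0$. The notation was set in Section 2: $N(\mathcal{K}^d(\Gamma),\eta;\ell_r)$
denotes the maximal $\eta$-packing number of $\mathcal{K}^d(\Gamma)$ under the $\ell_r$-metric (the
square root of the loss function $\ell_r^2$) and $M(\mathfrak{P},\epsilon;\chi)$ denotes the $\epsilon$-covering
number of $\mathfrak{P}$ when distances are measured by the square root of the
chi-squared divergence, that is, it is the smallest integer $M$ for which there
exist probability measures $Q_1,\dots,Q_M$ satisfying $\min_{1 \le i \le M} \chi^2(P\|Q_i) \le \epsilon^2$
for every $P \in \mathfrak{P}$.

The application of (34) requires a lower bound on $N(\mathcal{K}^d(\Gamma),\eta;\ell_r)$ and
an upper bound on $M(\mathfrak{P},\epsilon;\chi)$. Guntuboyina [16], Theorem VII.1, building
on a result of Bronshtein [4], showed the existence of positive constants $\eta_0$
and $c'$ depending only on $d$ such that

$$\log N(\mathcal{K}^d(\Gamma),\eta;\ell_r) \ge c'\Bigl(\frac{\Gamma}{\eta}\Bigr)^{(d-1)/2} \quad\text{whenever } \eta \le \Gamma\eta_0.$$

The above bound uses crucially the fact that $\nu$ equals $\nu_{\mathrm{unif}}$. It is not true
for arbitrary probability measures on $S^{d-1}$.

For $M(\mathfrak{P},\epsilon;\chi)$, we note that the chi-squared divergence $\chi^2(\mathbb{P}_K\|\mathbb{P}_{K'})$
satisfies

$$\chi^2(\mathbb{P}_K\|\mathbb{P}_{K'}) \le \exp\Bigl(\frac{n\,\ell_H^2(K,K')}{\sigma^2}\Bigr) - 1.$$

As a result,

$$\chi^2(\mathbb{P}_K\|\mathbb{P}_{K'}) \le \epsilon^2 \quad\text{whenever } \ell_H(K,K') \le \epsilon' := \sigma\sqrt{\log(1+\epsilon^2)}/\sqrt{n},$$

and $M(\mathfrak{P},\epsilon;\chi) \le M(\mathcal{K}^d(\Gamma),\epsilon';\ell_H)$. An upper bound for the covering number
$M(\mathcal{K}^d(\Gamma),\epsilon';\ell_H)$ has been proved by Bronshtein [4], Theorem 3 and
Remark 1. We already used this result [inequality (18)] in the proof of
Theorem 3.1. We use it again to obtain

$$\log M(\mathfrak{P},\epsilon;\chi) \le c\Bigl(\frac{\Gamma\sqrt{n}}{\sigma\sqrt{\log(1+\epsilon^2)}}\Bigr)^{(d-1)/2} \quad\text{if } \log(1+\epsilon^2) \le n\Gamma^2\epsilon_0^2/\sigma^2$$

for positive constants $c$ and $\epsilon_0$. Let us now define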

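$$\eta(n) := c_1\,\sigma^{4/(d+3)}\Gamma^{(d-1)/(d+3)}n^{-2/(d+3)} \quad\text{and}\quad \alpha(n) := \Bigl(\frac{\sqrt{n}\,\Gamma}{\sigma}\Bigr)^{(d-1)/(d+3)},$$

where $c_1$ is a positive constant that depends on $d$ alone and will be specified
shortly. Also let $\epsilon^2(n) := \exp(\alpha^2(n)) - 1$. We then have

$$\log N(\mathcal{K}^d(\Gamma),\eta(n);\ell_r) \ge c'c_1^{-(d-1)/2}\alpha^2(n) \quad\text{and}\quad \log M(\mathfrak{P},\epsilon(n);\chi) \le c\,\alpha^2(n)$$

provided

(35)  $\eta(n) \le \Gamma\eta_0$  and  $\alpha^2(n) \le n\Gamma^2\epsilon_0^2/\sigma^2$.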

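Inequality (34) with $\eta = \eta(n)$ and $\epsilon = \epsilon(n)$ gives the following lower bound
for $R_r(n)$:

$$R_r(n) \ge \frac{\eta^2(n)}{4}\Bigl[1 - \exp\bigl(-\alpha^2(n)\,c'c_1^{-(d-1)/2}\bigr) - \exp\Bigl(\frac{\alpha^2(n)}{2}\bigl(1 + c - c'c_1^{-(d-1)/2}\bigr)\Bigr)\Bigr].$$

If we choose $c_1$ so that $c'c_1^{-(d-1)/2} = 2(1 + c)$, then

$$R_r(n) \ge \frac{\eta^2(n)}{4}\Bigl(1 - 2\exp\Bigl(-\frac{1+c}{2}\,\alpha^2(n)\Bigr)\Bigr).$$

If the condition $(1 + c)\alpha^2(n) \ge 2\log 4$ holds, then the above inequality
implies $R_r(n) \ge \eta^2(n)/8$ which yields (33). This condition as well as (35) hold
provided $n \ge C(\sigma/\Gamma)^2$ for a large enough $C$. The proof is complete. □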
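The exponent bookkeeping in this proof is easy to get wrong, so here is a small numerical sketch of the two identities it relies on: $(\Gamma/\eta(n))^{(d-1)/2} = c_1^{-(d-1)/2}\alpha^2(n)$ and $(\Gamma\sqrt{n}/(\sigma\alpha(n)))^{(d-1)/2} = \alpha^2(n)$. The function names below are hypothetical, and the parameter values in any call are illustrative only.

```python
import math


def eta_n(n: float, sigma: float, gamma: float, d: int, c1: float = 1.0) -> float:
    """eta(n) = c1 * sigma^(4/(d+3)) * gamma^((d-1)/(d+3)) * n^(-2/(d+3))."""
    return c1 * sigma ** (4 / (d + 3)) * gamma ** ((d - 1) / (d + 3)) * n ** (-2 / (d + 3))


def alpha_n(n: float, sigma: float, gamma: float, d: int) -> float:
    """alpha(n) = (sqrt(n) * gamma / sigma)^((d-1)/(d+3))."""
    return (math.sqrt(n) * gamma / sigma) ** ((d - 1) / (d + 3))


def packing_exponent(n: float, sigma: float, gamma: float, d: int, c1: float = 1.0) -> float:
    """(gamma / eta(n))^((d-1)/2), the factor multiplying c' in the packing bound."""
    return (gamma / eta_n(n, sigma, gamma, d, c1)) ** ((d - 1) / 2)


def covering_exponent(n: float, sigma: float, gamma: float, d: int) -> float:
    """(gamma sqrt(n) / (sigma alpha(n)))^((d-1)/2), the factor multiplying c."""
    a = alpha_n(n, sigma, gamma, d)
    return (gamma * math.sqrt(n) / (sigma * a)) ** ((d - 1) / 2)
```

Both exponents reduce to multiples of $\alpha^2(n)$, which is what allows the packing and covering terms in (34) to be compared directly.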

Remark 4.2. In the above proof, the random-design assumption on the
unit vectors $u_1,\dots,u_n$ was used only in

$$\chi^2(\mathbb{P}_K\|\mathbb{P}_{K'}) \le \exp\Bigl(\frac{n\,\ell_H^2(K,K')}{\sigma^2}\Bigr) - 1.$$

This inequality is easily seen to be true for every joint distribution of
$(u_1,\dots,u_n)$ as long as they are independent of the errors $\xi_1,\dots,\xi_n$.
Consequently, $n^{-4/(d+3)}$, up to multiplicative constants, is a lower bound for the
minimax risk (observe that the integral below is with respect to the uniform
probability measure $\nu_{\mathrm{unif}}$):

$$\inf_{\hat{K}} \sup_{K \in \mathcal{K}^d(\Gamma)} \mathbb{E}_K \int \bigl(h_K(u) - h_{\hat{K}}(u)\bigr)^2\,\nu_{\mathrm{unif}}(du)$$

for every choice of the design unit vectors (deterministic or random)
as long as they are independent of $\xi_1,\dots,\xi_n$ (this independence
assumption is only relevant in a random-design setting).
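The chi-squared bound in this remark rests on the closed form for two normal distributions with a common variance, $\chi^2(N(\mu,\sigma^2)\|N(\mu',\sigma^2)) = \exp((\mu-\mu')^2/\sigma^2) - 1$. The sketch below compares that closed form with a trapezoid-rule integral for a single observation. The function names are hypothetical, and the integration window and step count are arbitrary choices rather than anything taken from the paper.

```python
import math


def log_normal_pdf(x: float, mean: float, sigma: float) -> float:
    """Logarithm of the N(mean, sigma^2) density at x."""
    return -0.5 * ((x - mean) / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))


def chi_squared_numeric(mu_p: float, mu_q: float, sigma: float,
                        half_width: float = 15.0, steps: int = 200_000) -> float:
    """Trapezoid-rule approximation of int p^2/q dx - 1 for two normal densities."""
    # p^2/q is proportional to a normal density centred at 2*mu_p - mu_q.
    centre = 2.0 * mu_p - mu_q
    lo = centre - half_width * sigma
    hi = centre + half_width * sigma
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        # Work with log densities so that the ratio never divides by an underflowed value.
        total += weight * math.exp(2.0 * log_normal_pdf(x, mu_p, sigma) - log_normal_pdf(x, mu_q, sigma))
    return total * h - 1.0


def chi_squared_closed_form(mu_p: float, mu_q: float, sigma: float) -> float:
    """exp((mu_p - mu_q)^2 / sigma^2) - 1."""
    return math.exp((mu_p - mu_q) ** 2 / sigma ** 2) - 1.0
```

For $n$ independent observations the divergence factorizes, so the exponent becomes a sum of squared mean differences divided by $\sigma^2$, which is at most $n\,\ell_H^2(K,K')/\sigma^2$.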

4.3. Least squares on polytopes. We prove the random-design analogue
of Theorem 3.4. Let

$$\tilde{R}_r(n) := \inf_{m \ge 1} \sup_{K \in \mathcal{K}^d(\Gamma)} \mathbb{E}_K\,\ell_r^2(K,\hat{K}_m)$$

be the best achievable risk by $\hat{K}_m$ [defined in (24)] in the random-design
setting under the loss function $\ell_r^2$. The following theorem shows that $\tilde{R}_r(n)$
is bounded from above by $n^{-4/(d+3)}$ up to a multiplicative factor that is
logarithmic in $n$. No assumptions on $\nu$ are necessary.

Theorem 4.3. There exist positive constants $c$ and $C$ that depend only
on the dimension $d$ such that

(36)  $\tilde{R}_r(n) \le c\max(\sigma^2,\Gamma^2)\,n^{-4/(d+3)}(\log(cn))^2$  if $n \ge C$.

The proof strategy is to use the fixed design bound (25) along with
Lemma C.1 (stated and proved in Appendix C) which relates the risks under the
two loss functions $\ell_f^2$ and $\ell_r^2$.

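Proof of Theorem 4.3. We start with the inequality

$$\ell_r^2(\hat{K}_m,K) \le 2\bigl(\ell_r(\hat{K}_m,K) - 2\ell_f(\hat{K}_m,K)\bigr)_+^2 + 8\ell_f^2(\hat{K}_m,K),$$

which implies that

$$\mathbb{E}_K\ell_r^2(\hat{K}_m,K) \le 2\mathbb{E}_\nu \sup_{L \in \mathcal{P}_m(\Gamma)} \bigl(\ell_r(L,K) - 2\ell_f(L,K)\bigr)_+^2 + 8\mathbb{E}_K\ell_f^2(\hat{K}_m,K).$$

The first expectation in the right-hand side is bounded using Lemma C.1
where it is shown that

(37)  $\mathbb{E}_\nu \sup_{L \in \mathcal{P}_m(\Gamma)} \bigl(\ell_r(L,K) - 2\ell_f(L,K)\bigr)_+^2 \le c\,\frac{\Gamma^2}{n}\,md\log(cn)$

for a universal positive constant $c$. For the second expectation, we use ideas
from the proof of Theorem 3.4. Indeed, the same argument which led to the
inequality (28) gives, for every $P \in \mathcal{P}_m(\Gamma)$,

$$\mathbb{E}_K\bigl(\ell_f^2(\hat{K}_m,K) \mid u_1,\dots,u_n\bigr) \le c\Bigl(\ell_f^2(K,P) + \frac{\sigma^2}{n}\,md\log(b_2m)\log(cn)\Bigr)$$

for every $u_1,\dots,u_n$. Taking expectation with respect to $u_1,\dots,u_n$
independently distributed according to $\nu$, we get

(38)  $\mathbb{E}_K\ell_f^2(\hat{K}_m,K) \le c\Bigl(\ell_r^2(K,P) + \frac{\sigma^2}{n}\,md\log(b_2m)\log(cn)\Bigr).$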
Putting (37) and (38) together, we obtain

$$\mathbb{E}_K\ell_r^2(\hat{K}_m,K) \le c\Bigl(\ell_r^2(K,P) + \frac{\sigma^2}{n}\,md\log(b_2m)\log(cn) + \frac{\Gamma^2}{n}\,md\log(cn)\Bigr) \le c\Bigl(\ell_r^2(K,P) + \frac{\max(\sigma^2,\Gamma^2)}{n}\,md\log(b_2m)\log(cn)\Bigr).$$

Because $K \in \mathcal{K}^d(\Gamma)$ and $P \in \mathcal{P}_m(\Gamma)$ are arbitrary, we have shown

$$\tilde{R}_r(n) \le c\inf_{m \ge 1}\Bigl[\sup_{K \in \mathcal{K}^d(\Gamma)}\inf_{P \in \mathcal{P}_m(\Gamma)} \ell_r^2(K,P) + \frac{\max(\sigma^2,\Gamma^2)}{n}\,md\log(b_2m)\log(cn)\Bigr].$$

Just as in the proof of Theorem 3.4, we use the result (29) due to Bronshtein
and Ivanov [6] to get

$$\tilde{R}_r(n) \le C\inf_{m \ge C_2}\Bigl[\Gamma^2m^{-4/(d-1)} + \frac{\max(\sigma^2,\Gamma^2)}{n}\,md\log(b_2m)\log(cn)\Bigr] \le C\max(\sigma^2,\Gamma^2)\inf_{m \ge C_2}\Bigl[m^{-4/(d-1)} + \frac{md}{n}\log(b_2m)\log(cn)\Bigr].$$

If we now choose $m := n^{(d-1)/(d+3)}$, then $m \ge C_2$ provided $n \ge C$ for a large
enough constant $C$ depending only on $d$ and the required inequality (36)
follows. □
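The choice $m = n^{(d-1)/(d+3)}$ balances the approximation term $m^{-4/(d-1)}$ against the estimation term $md/n$. The following is a minimal sketch of that balance with logarithmic factors and constants dropped; the function names are hypothetical, and $m$ is treated as a real number rather than rounded to an integer.

```python
import math


def tradeoff_terms(n: float, d: int, m: float) -> tuple:
    """Return (approximation term, estimation term), ignoring logs and constants."""
    approximation = m ** (-4.0 / (d - 1))
    estimation = m * d / n
    return approximation, estimation


def balanced_m(n: float, d: int) -> float:
    """The choice m = n^((d-1)/(d+3)) made at the end of the proof."""
    return n ** ((d - 1) / (d + 3))


def target_rate(n: float, d: int) -> float:
    """The rate n^(-4/(d+3)) appearing in (36)."""
    return n ** (-4.0 / (d + 3))
```

With this choice both terms are of order $n^{-4/(d+3)}$, and the two logarithmic factors that were dropped here account for the $(\log(cn))^2$ in (36).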



APPENDIX A 

In this section, we shall prove the following result which was used in the
proof of Theorem 3.2. We assume that $u_1,\dots,u_n$ form a maximal $\epsilon$-packing
subset of $S^{d-1}$ for some $\epsilon \in (0,1]$.

Lemma A.1. For a fixed $0 < \eta \le 1/8$ and a unit vector $v$, consider the
following two subsets of the ball $B_d(0,\Gamma)$:

$$D(0) := B_d(0,\Gamma) \cap \{x : x \cdot v \le \Gamma(1-\eta)\} \quad\text{and}\quad D(1) := B_d(0,\Gamma).$$

Then there exist constants $c$ and $C$ such that the following inequality holds
whenever $\eta \ge C\epsilon^2$:

(39)  $c\,\eta^{(d+3)/2} \le \ell_f^2(D(0),D(1)) \le C\,\eta^{(d+3)/2}.$

We need some elementary results on spherical caps for the proof of this
lemma. For a unit vector $u$ and a real number $0 < \delta \le 1$, consider the
spherical cap $S(u;\delta) := S^{d-1} \cap B_d(u,\delta)$. It can be checked that this spherical cap
consists of precisely those unit vectors which form an angle of at most $\alpha$
with $u$, where $\alpha$ is related to $\delta$ through

$$\cos\alpha = 1 - \frac{\delta^2}{2} \quad\text{and}\quad \sin\alpha = \frac{\delta\sqrt{4-\delta^2}}{2}.$$
A standard result is that $\nu_{\mathrm{unif}}(S(u;\delta))$ equals $C\int_0^\alpha \sin^{d-2}t\,dt$ (recall that $\nu_{\mathrm{unif}}$
denotes the uniform probability measure on the unit sphere). This integral
can be bounded from above and below in the following simple way. For
a lower bound, we write

$$\int_0^\alpha \sin^{d-2}t\,dt \ge \int_0^\alpha \sin^{d-2}t\cos t\,dt = \frac{\sin^{d-1}\alpha}{d-1},$$

and for an upper bound, we note

$$\int_0^\alpha \sin^{d-2}t\,dt \le \int_0^\alpha \sin^{d-2}t\,\frac{\cos t}{\cos\alpha}\,dt = \frac{\sin^{d-1}\alpha}{(d-1)\cos\alpha}.$$

We thus have $c\sin^{d-1}\alpha \le \nu_{\mathrm{unif}}(S(u;\delta)) \le C\sin^{d-1}\alpha/\cos\alpha$. Writing $\cos\alpha$ and
$\sin\alpha$ in terms of $\delta$ and using the assumption that $0 < \delta \le 1$, we obtain that

(40)  $c\,\delta^{d-1} \le \nu_{\mathrm{unif}}(S(u;\delta)) \le C\,\delta^{d-1}$  if $\delta \in (0,1]$.
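The two-sided bound on the cap integral can be checked by direct quadrature. This is a minimal sketch using a midpoint rule; the function names and the step count are assumptions made for illustration and are not part of the paper.

```python
import math


def cap_angle(delta: float) -> float:
    """Angle alpha of the cap S(u; delta), from cos(alpha) = 1 - delta^2 / 2."""
    return math.acos(1.0 - delta ** 2 / 2.0)


def cap_integral(alpha: float, d: int, steps: int = 20000) -> float:
    """Midpoint-rule approximation of the integral of sin^(d-2)(t) over [0, alpha]."""
    h = alpha / steps
    return h * sum(math.sin((i + 0.5) * h) ** (d - 2) for i in range(steps))


def cap_bounds(alpha: float, d: int) -> tuple:
    """Lower and upper bounds sin^(d-1)(alpha)/(d-1) and the same divided by cos(alpha)."""
    lower = math.sin(alpha) ** (d - 1) / (d - 1)
    upper = lower / math.cos(alpha)
    return lower, upper
```

For $\delta \in (0,1]$ the angle satisfies $\alpha \le \pi/3$, so $\cos\alpha \ge 1/2$ and the two bounds differ by at most a factor of 2, which is how (40) follows.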

This inequality can be combined with a simple volumetric argument to show
that

(41)  $c\,\delta^{1-d} \le N(S^{d-1},\delta;\mathfrak{d}_d) \le C\,\delta^{1-d}$  if $\delta \in (0,1]$.

In particular, since $u_1,\dots,u_n$ form a maximal $\epsilon$-packing subset of $S^{d-1}$, we
have (19).

The following lemma is used in the proof of Lemma A.1.

Lemma A.2. Fix positive $\epsilon, \delta$ such that $\delta \le 4/5$ and $\epsilon \le \delta/2$. Let $u_1,\dots,u_n$
be a maximal $\epsilon$-packing subset of the unit sphere and $v$ be an arbitrary unit
vector. Let $V(\epsilon,\delta)$ denote the number of points $u_1,\dots,u_n$ that are contained
in $S(v,\delta)$. Then

(42)  $c\Bigl(\frac{\delta}{\epsilon}\Bigr)^{d-1} \le V(\epsilon,\delta) \le C\Bigl(\frac{\delta}{\epsilon}\Bigr)^{d-1}.$

Proof. For the lower bound on $V(\epsilon,\delta)$, we observe that

(43)  $S(v,\delta/2) \subseteq \bigcup_{i\,:\,u_i \in S(v,\delta)} S(u_i,\epsilon).$

Indeed, because $\epsilon \le \delta/2$, for every $w \in S(v,\delta/2)$, we can find $u_i$ such that
$\mathfrak{d}_d(u_i,w) \le \epsilon \le \delta/2$ because $u_1,\dots,u_n$ form a maximal $\epsilon$-packing subset
of $S^{d-1}$. Thus $u_i \in S(v,\delta)$ by the triangle inequality, which proves (43).

It follows from (43) that $V(\epsilon,\delta) \ge \nu_{\mathrm{unif}}(S(v,\delta/2))/\nu_{\mathrm{unif}}(S(u_1,\epsilon))$ from
which the lower bound in (42) follows by use of (40).

For the upper bound on $V(\epsilon,\delta)$, we use the inclusion

$$S(v,\delta + \epsilon/2) \supseteq \bigcup_{i\,:\,u_i \in S(v,\delta)} S(u_i,\epsilon/2)$$

and then, noting that the spherical caps on the right-hand side of the
above inclusion are disjoint as $u_1,\dots,u_n$ form an $\epsilon$-packing subset, we
obtain $V(\epsilon,\delta) \le \nu_{\mathrm{unif}}(S(v,\delta + \epsilon/2))/\nu_{\mathrm{unif}}(S(u_1,\epsilon/2))$. The upper bound in (42)
again follows from (40). □

We are now ready to prove Lemma A.1.

Proof of Lemma A.1. It can be checked that the support functions
of $D(0)$ and $D(1)$ differ only for unit vectors in the spherical cap $S(v,\sqrt{2\eta})$.
This spherical cap consists of all unit vectors which form an angle of at
most $\alpha$ with $v$ where $\cos\alpha = 1 - \eta$. In fact, if $\theta$ denotes the angle between an
arbitrary unit vector $u$ and $v$, it can be verified by elementary trigonometry
that

(44)  $h_{D(0)}(u) - h_{D(1)}(u) = \begin{cases} -\Gamma(1 - \cos(\alpha - \theta)), & \text{if } 0 \le \theta \le \alpha, \\ 0, & \text{otherwise.} \end{cases}$

As a result, it follows that $\ell_f^2(D(0),D(1)) \le \Gamma^2\eta^2V(\epsilon,\sqrt{2\eta})/n$ where $V$ is as
defined in Lemma A.2. Thus (42) gives $\ell_f^2(D(0),D(1)) \le C\Gamma^2\eta^{(d+3)/2}\epsilon^{1-d}/n$.
The conditions in Lemma A.2 are satisfied if $0 < \eta \le 1/8$ and $\eta \ge C\epsilon^2$ for
a large enough $C$. Moreover, by (19), we have $c \le n\epsilon^{d-1} \le C$ which implies
that $\ell_f^2(D(0),D(1)) \le C\eta^{(d+3)/2}$.

For a lower bound, fix $0 < b \le 1$ and let $0 \le \beta \le \alpha$ denote the angle
for which $1 - \cos(\alpha - \beta) = b\eta$. It follows from (44) that the difference in
the support functions of $D(0)$ and $D(1)$ is at least $b\Gamma\eta$ for all unit
vectors in the spherical cap consisting of all unit vectors forming an angle
of at most $\beta$ with $v$. This spherical cap is $S(v,t)$ where $t$ is given by
$t^2 := 2(1 - \cos\beta)$. Therefore $\ell_f^2(D(0),D(1)) \ge b^2\Gamma^2\eta^2V(\epsilon,t)/n$. The
inequality $t^2 \le 2(1 - \cos\alpha) \le 2\eta$ is easily checked. Also, $t \ge \sin\beta$ and $\sin\beta$ can be
bounded from below in the following way:

$$1 - b\eta = \cos(\alpha - \beta) \le \cos\alpha + \sin\alpha\sin\beta \le 1 - \eta + \sqrt{2\eta}\,\sin\beta.$$

Thus $t \ge \sin\beta \ge (1 - b)\sqrt{\eta/2}$ and from (42), it follows that

$$\ell_f^2(D(0),D(1)) \ge c\,b^2\,\frac{\Gamma^2\eta^2}{n}\Bigl(\frac{t}{\epsilon}\Bigr)^{d-1} \ge c\,b^2(1-b)^{d-1}\,\frac{\Gamma^2\eta^2}{n}\Bigl(\frac{\sqrt{\eta}}{\epsilon}\Bigr)^{d-1}$$

for all $0 < b \le 1$. Note that we have used $\eta \ge C\epsilon^2$ here to satisfy the
conditions in Lemma A.2. We now use (19) and choose $b = 1/2$ to get
$\ell_f^2(D(0),D(1)) \ge c\,\eta^{(d+3)/2}$ provided $\eta \ge C\epsilon^2$. The proof is complete. □



APPENDIX B 



In this section, we prove the following lemma which was used in the proof 
of Theorem 3.4. 
Lemma B.1. Fix $m \ge 1$ and $\omega, \epsilon > 0$. The following bound holds for
every $P \in \mathcal{P}_m$:



M{S^{P,uj),e;ef)<{4 + 



bimdlog{b2m) 



e 

where $b_1$ and $b_2$ are universal positive constants.

For the proof of this lemma, we use available techniques for bounding
covering numbers using combinatorial notions of dimension. Specifically, we
use the notion of pseudodimension, introduced by Pollard [23], Chapter 4,
as a generalization of the Vapnik-Červonenkis dimension to classes of
real-valued functions. The pseudodimension of a subset $A$ of $\mathbb{R}^n$ is defined as
the maximum cardinality of a subset $\sigma \subseteq \{1,\dots,n\}$ for which there exists
$(h_1,\dots,h_n) \in \mathbb{R}^n$ such that: for every $\sigma' \subseteq \sigma$, one can find $(a_1,\dots,a_n) \in A$
with $a_i < h_i$ for $i \in \sigma'$ and $a_i > h_i$ for $i \in \sigma \setminus \sigma'$. The following theorem is
a special case of results in Pollard [23], Chapter 4, and gives an upper bound
for the covering number (with respect to the Euclidean metric) of a subset
of $\mathbb{R}^n$ in terms of its pseudodimension. Stronger results of this kind have
been proved by Mendelson and Vershynin [21] and the following theorem is
also a special case of Mendelson and Vershynin [21], Theorem 1.

Theorem B.2. Let $A$ be a subset of $\mathbb{R}^n$ with $\max_i |a_i| \le B$ for all $a =
(a_1,\dots,a_n) \in A$. If the pseudodimension of $A$ is at most $V$, then, for every
$t > 0$, we have



t 

where b is a universal positive constant. 

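Proof of Lemma B.1. Fix a polytope $P \in \mathcal{P}_m$ and let $x_0 \in \mathbb{R}^n$ denote
the point $(h_P(u_1),\dots,h_P(u_n))$. Also, for $m \ge 1$, let

$$H_m := \{x \in \mathbb{R}^n : x = (h_L(u_1),\dots,h_L(u_n)) \text{ for some } L \in \mathcal{P}_m\}.$$

Clearly

$$M(S_m(P,\omega),\epsilon;\ell_f) = M\bigl(B_n(0,\sqrt{n}\,\omega) \cap (H_m - x_0),\ \sqrt{n}\,\epsilon;\ \mathfrak{d}_n\bigr),$$

where $H_m - x_0 := \{x - x_0 : x \in H_m\}$.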

We now show that the pseudodimension of $H_m$, which clearly equals the
pseudodimension of $H_m - x_0$, is less than or equal to $\kappa md\log(\kappa m)$ where
$\kappa = 2/\log 2$. An application of Theorem B.2 would then complete the proof.
Note that the quantity $B$ in the statement of Theorem B.2 can be taken to
be $\sqrt{n}\,\omega$ because $|a_i| \le \sqrt{n}\,\omega$ for every $(a_1,\dots,a_n) \in B_n(0,\sqrt{n}\,\omega)$.

Every $L$ in $\mathcal{P}_m$ is of the form $\mathrm{conv}(S)$ for some $S \subseteq \mathbb{R}^d$ with cardinality at
most $m$ and thus $h_L(u) = \max_{x \in S}(x \cdot u)$. Therefore $H_1$ is a linear subspace
of $\mathbb{R}^n$ with dimension at most $d$ which implies (see Pollard [23], page 15)
that the pseudodimension of $H_1$ is at most $d$. The pseudodimension of $H_m$,
which consists of coordinatewise maxima of at most $m$ points in $H_1$, can
then be bounded from above following the argument in Pollard [23], proof
of Lemma (5.1). Indeed, that argument shows that the pseudodimension
of $H_m$ is bounded from above by the smallest positive integer $k$ for which

(45)  $\binom{k}{0} + \binom{k}{1} + \dots + \binom{k}{d} < 2^{k/m}.$

If $\mathfrak{B}$ denotes a binomial random variable with parameters $k$ and $1/2$, then
the left-hand side of the above inequality equals $2^k\,\mathbb{P}\{\mathfrak{B} \ge k - d\}$ and can
be bounded, again following Pollard [23], proof of Lemma (5.1), as shown
below. For every $\alpha > 1$,

(46)  $2^k\,\mathbb{P}\{\mathfrak{B} \ge k - d\} \le 2^k\,\mathbb{P}\{\alpha^{\mathfrak{B}} \ge \alpha^{k-d}\} \le 2^k\alpha^{d-k}\,\mathbb{E}\alpha^{\mathfrak{B}} = (1 + \alpha)^k\alpha^{d-k}.$

If we now choose $\alpha = (2^{1/(2m)} - 1)^{-1}$, then $(1 + \alpha)/\alpha = 2^{1/(2m)}$ and also,
applying the inequality $x - 1 \ge \log x$ to $x = 2^{1/(2m)}$, we get that $\alpha \le 2m/(\log 2)$.
Therefore, from (46), we have

$$2^k\,\mathbb{P}\{\mathfrak{B} \ge k - d\} \le \Bigl(\frac{2m}{\log 2}\Bigr)^d 2^{k/(2m)}.$$

The following inequality therefore ensures that $k$ satisfies (45):

$$\Bigl(\frac{2m}{\log 2}\Bigr)^d < 2^{k/(2m)} \quad\text{or, equivalently,}\quad k > \kappa md\log(\kappa m),$$

where $\kappa = 2/(\log 2)$. It follows, therefore, that the pseudodimension of $H_m$
is at most $\kappa md\log(\kappa m)$. The proof of Lemma B.1 is complete. □
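The final step can be checked with exact integer arithmetic: any integer $k$ exceeding $(2/\log 2)\,md\log((2/\log 2)\,m)$ should satisfy $\sum_{j \le d}\binom{k}{j} < 2^{k/m}$. The sketch below uses hypothetical function names and tests the inequality in the equivalent form $\bigl(\sum_{j \le d}\binom{k}{j}\bigr)^m < 2^k$ to avoid fractional powers.

```python
import math


def pseudodimension_threshold(m: int, d: int) -> float:
    """kappa * m * d * log(kappa * m) with kappa = 2 / log 2."""
    kappa = 2.0 / math.log(2.0)
    return kappa * m * d * math.log(kappa * m)


def satisfies_binomial_condition(k: int, m: int, d: int) -> bool:
    """Exact check of sum_{j<=d} C(k, j) < 2^(k/m), raised to the m-th power."""
    lhs = sum(math.comb(k, j) for j in range(d + 1))
    return lhs ** m < 2 ** k


def smallest_integer_above_threshold(m: int, d: int) -> int:
    """Smallest integer strictly larger than the threshold."""
    return math.floor(pseudodimension_threshold(m, d)) + 1
```

The threshold is sufficient but need not be the smallest $k$ satisfying the condition; it is the convenient closed form that the covering-number bound needs.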



APPENDIX C 

In this section, we prove the following result which was used in the proof
of Theorem 4.3. It relates the risks under the loss functions $\ell_f^2$ and $\ell_r^2$. Its
proof is based on standard empirical process arguments due to Pollard [22].
We have also borrowed ideas from Györfi, Kohler, Krzyżak and Walk [17],
Chapter 11.

Lemma C.1. There exists a universal constant $c$ such that the following
inequality holds for every $m \ge 1$, $\Gamma > 0$ and $K \in \mathcal{K}^d(\Gamma)$:

(47)  $\mathbb{E}_\nu \sup_{L \in \mathcal{P}_m(\Gamma)} \bigl(\ell_r(L,K) - 2\ell_f(L,K)\bigr)_+^2 \le c\,\frac{\Gamma^2}{n}\,md\log(cn).$

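Proof. For $x > 0$, let

$$Q_x := \mathbb{P}_\nu\Bigl\{\sup_{L \in \mathcal{P}_m(\Gamma)} \bigl(\ell_r(L,K) - 2\ell_f(L,K)\bigr)_+^2 > x\Bigr\}.$$

Letting $\mathcal{F} := \{h_L - h_K : L \in \mathcal{P}_m(\Gamma)\}$, we have the trivial inequality

$$Q_x \le \mathbb{P}_\nu\Bigl\{\sup_{g \in \mathcal{F}}\Bigl[\Bigl(\int g^2\,d\nu\Bigr)^{1/2} - 2\Bigl(\frac{1}{n}\sum_{i=1}^n g^2(u_i)\Bigr)^{1/2}\Bigr] > x^{1/2}\Bigr\}.$$

Each function in $\mathcal{F}$ is bounded in absolute value by $2\Gamma$. We now use Györfi et
al. [17], Theorem 11.2, which is a slight variation of Pollard [22], Lemma 33
and Problem 24, to obtain

(48)  $Q_x \le 4\mathbb{E}_\nu\min\Bigl(1,\ M(\mathcal{P}_m(\Gamma),\sqrt{x}/24;\ell_f)\exp\Bigl(-\frac{nx}{2304\Gamma^2}\Bigr)\Bigr).$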



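Lemma B.1 with $\omega = \Gamma$, $\epsilon = \sqrt{x}/24$ and $P = \{0\}$ provides an upper bound for
the covering number $M(\mathcal{P}_m(\Gamma),\sqrt{x}/24;\ell_f)$. The set $\mathcal{P}_m(\Gamma)$ is much smaller
than $S_m(P,\omega)$, however, and the following direct argument gives the simpler
upper bound:

(49)  $M(\mathcal{P}_m(\Gamma),\sqrt{x}/24;\ell_f) \le M(\mathcal{P}_m(\Gamma),\sqrt{x}/24;\ell_H) \le \Bigl(1 + \frac{48\Gamma}{\sqrt{x}}\Bigr)^{md}.$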

To see this, let $\mathfrak{G}$ be an $\epsilon := \sqrt{x}/24$-covering subset of $B_d(0,\Gamma)$ under the
Euclidean metric $\mathfrak{d}_d$. Then $\mathfrak{S} := \{\mathrm{conv}(S) : S \subseteq \mathfrak{G} \text{ with } |S| \le m\}$ forms an
$\epsilon$-covering subset of $\mathcal{P}_m(\Gamma)$ under $\ell_H$. Indeed, if $K = \mathrm{conv}\{a_1,\dots,a_m\}$ with
$a_1,\dots,a_m \in B_d(0,\Gamma)$ is an arbitrary element of $\mathcal{P}_m(\Gamma)$, then we can
choose $a_1',\dots,a_m' \in \mathfrak{G}$ with $\mathfrak{d}_d(a_i,a_i') \le \epsilon$ for each $i$. It is then easy to see
[using (11)] that the Hausdorff distance between $K$ and $\mathrm{conv}\{a_1',\dots,a_m'\}$ is
at most $\epsilon$. It is evident that the cardinality of $\mathfrak{S}$ is at most $|\mathfrak{G}|^m$. A standard
volumetric argument shows that the cardinality of $\mathfrak{G}$ can be taken to be
smaller than $(1 + (2\Gamma/\epsilon))^d$ which proves (49).
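Combining (48) and (49), we get

$$Q_x \le 4\Bigl(1 + \frac{48\Gamma}{\sqrt{x}}\Bigr)^{md}\exp\Bigl(-\frac{nx}{2304\Gamma^2}\Bigr).$$

The left-hand side of (47) can now be bounded, for every $A > 0$, by

$$\int_0^\infty Q_x\,dx = \int_0^A Q_x\,dx + \int_A^\infty Q_x\,dx \le 4A + 4\int_A^\infty \Bigl(1 + \frac{48\Gamma}{\sqrt{x}}\Bigr)^{md}\exp\Bigl(-\frac{nx}{2304\Gamma^2}\Bigr)dx \le 4A + 9216\,\frac{\Gamma^2}{n}\Bigl(1 + \frac{48\Gamma}{\sqrt{A}}\Bigr)^{md}\exp\Bigl(-\frac{nA}{2304\Gamma^2}\Bigr).$$

The required bound (47) is now easily deduced by using the above inequality
with

$$A := 2304\,\frac{\Gamma^2}{n}\,md\log(1 + 48\sqrt{n}). \qquad \square$$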
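With the stated choice of $A$, the factor $(1 + 48\Gamma/\sqrt{A})^{md}\exp(-nA/(2304\Gamma^2))$ is at most 1, so the whole bound is at most $4A + 9216\,\Gamma^2/n$. The following is a minimal numerical sketch of that last step; the function names are hypothetical, and the factor is computed on the log scale to avoid overflow when $md$ is large.

```python
import math


def threshold_a(n: int, gamma: float, m: int, d: int) -> float:
    """A = 2304 * (gamma^2 / n) * m * d * log(1 + 48 sqrt(n))."""
    return 2304.0 * gamma ** 2 / n * m * d * math.log(1.0 + 48.0 * math.sqrt(n))


def log_tail_factor(n: int, gamma: float, m: int, d: int) -> float:
    """Logarithm of (1 + 48 gamma / sqrt(A))^(m d) * exp(-n A / (2304 gamma^2))."""
    a = threshold_a(n, gamma, m, d)
    return m * d * math.log(1.0 + 48.0 * gamma / math.sqrt(a)) - n * a / (2304.0 * gamma ** 2)


def integral_bound(n: int, gamma: float, m: int, d: int) -> float:
    """4A + 9216 (gamma^2 / n) * tail factor, the last bound in the proof."""
    a = threshold_a(n, gamma, m, d)
    return 4.0 * a + 9216.0 * gamma ** 2 / n * math.exp(log_tail_factor(n, gamma, m, d))
```

Since $A$ is itself of order $(\Gamma^2/n)\,md\log(cn)$, the remaining $9216\,\Gamma^2/n$ term is of smaller order and (47) follows.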